ABSTRACT
Given a text of length n and a pattern, we present a parallel linear algorithm for finding all occurrences of the pattern in the text. The algorithm runs in O(n/p) time using any number p ≤ n/log n of processors on a concurrent-read concurrent-write parallel random-access-machine.
I. Introduction

The family of models of computation used in this paper is the parallel random-access-machines (PRAMs). All members of this family employ p synchronous processors, all having access to a common memory. The present paper refers to two members of the PRAM family. Our presentation focuses on the concurrent-read concurrent-write (CRCW) PRAM. This model allows simultaneous reading from the same memory location as well as simultaneous writing. In the latter case, the processor with the smallest serial number among those that attempt to write succeeds. At the end of the paper we show that a weaker concurrent-read concurrent-write PRAM model, where several processors may attempt to write into the same memory location only if they seek to write the same thing, actually suffices for the strongest results in this paper. There, we also show how to implement some of the results on a concurrent-read exclusive-write (CREW) PRAM, where simultaneous reading from the same memory location, but not simultaneous writing, is allowed. See [Vi-83a] for a recent survey of results concerning the PRAM family.

Let Seq(n) be the fastest known worst-case running time of a sequential algorithm, where n is the length of the input for the problem being considered. Obviously, the best upper bound on the parallel time achievable using p processors without improving the sequential result is of the form O(Seq(n)/p). A parallel algorithm that achieves this running time is said to have optimal speed-up or, more simply, to be optimal. A goal in serial computation is to design linear-time algorithms (O(n) time). Analogously, a goal in parallel computation is to design algorithms whose running time is proportional to n/p, where p is the number of processors used. In this case we say that a parallel algorithm achieves parallel linear running time.

The list of optimal speed-up parallel algorithms obtained so far is short, in spite of the interest in them. Let us first mention the few known parallel linear algorithms: computation of partial (prefix) "sums" of n variables, where the word "sum" stands for any associative binary operation; [SV-81] for finding the maximum among n elements and for merging; [Vi-83b] for finding the k smallest out of n elements; [Vi-84] for ranking a linked list (a randomized algorithm); [G-84] for string matching (where the symbols are taken from an alphabet whose size is bounded); [CLC-81] and [Vi-81] for computing connected components of dense graphs; [TC-84] and [TV-83] for computing biconnected components of dense graphs; and [BV-84] for generating a computation tree form of an arithmetic expression and for finding matches in a sequence of parentheses. In addition, there are optimal speed-up algorithms for two more problems: [AKS-83], [RV-83] (a randomized algorithm) and [SV-81] for sorting, and [PVW-83] for various operations on 2-3 trees.

The string matching problem is defined as follows. Input. Two arrays PATTERN and TEXT whose lengths are m and n, respectively. Output. A Boolean array MATCH of length n. MATCH(i), 1 ≤ i ≤ n, indicates whether an occurrence of PATTERN starts at TEXT(i).
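The definition above can be restated as a brute-force serial routine, which is useful as a reference point when checking the steps described later. The following is a minimal Python sketch, not the paper's algorithm (it takes O(nm) time); the function name naive_match is our own, and MATCH is returned as a list whose index 0 is unused so that positions stay 1-based as in the text.

```python
def naive_match(text: str, pattern: str) -> list[bool]:
    """Brute-force MATCH: match[i] is True iff PATTERN starts at TEXT(i).

    Positions are 1-based as in the paper; match[0] is an unused dummy.
    Starts i > n - m + 1 cannot hold a full occurrence and stay False.
    """
    n, m = len(text), len(pattern)
    match = [False] * (n + 1)
    for i in range(1, n - m + 2):
        match[i] = text[i - 1:i - 1 + m] == pattern
    return match


if __name__ == "__main__":
    result = naive_match("abababa", "aba")
    print([i for i, b in enumerate(result) if b])  # [1, 3, 5]
```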

The main contribution of this paper is an original parallel linear algorithm for the general case of this problem, which runs in O(log n) time on a CRCW PRAM. The text analysis part of the algorithm achieves this efficiency on a CREW PRAM. The algorithm can also be implemented as a parallel linear algorithm which runs in O(log² n) time on a CREW PRAM.

There are two known linear-time serial algorithms for this extensively studied problem, due to [BM-77] and [KMP-77]. Recall that every parallel linear algorithm is, in particular, a linear-time serial algorithm. The present result is stronger than theirs in the sense that it gives a parallel linear algorithm, while their serial algorithms do not seem to imply satisfactory parallel linear algorithms. Moreover, our algorithm is not more complicated than theirs, and some parts of it (particularly the analysis of the text) are even considerably simpler.

The string matching algorithm of [G-84] also runs in O(log n) time using n/log n processors, but requires the size of the alphabet to be fixed. However, it needs n processors in order to obtain O(log n) time for the general case considered here, and simulating it by a single processor takes O(n log n) time. Unlike his algorithm, ours does not use the "Four Russians Trick" ([AHU-74]), in which O(log n) bits are packed into a single register and each instruction concerning this register is counted as one operation. We use a few ideas from Galil's paper but are able to improve his result due to the following:
(1) Novel algorithmic ideas for the string matching problem. We briefly sketch one such notable idea; a formal presentation is given in Section 3. The pattern is preanalyzed and the following table is constructed. Consider the proposition: "The suffix starting at position i of the pattern is a prefix of the pattern." For each i, 1 ≤ i ≤ m, the table either indicates that the proposition is true, or points to a single character following i that provides a counterexample to the proposition. Let j1 > j2 be two locations of the text such that j1 - j2 < m and the suffix starting at position j1 - j2 + 1 of the pattern is not a prefix of the pattern. Following the analysis of the pattern, position j1 - j2 + 1 of the table points to a counterexample, say w. That is, PATTERN[j1 - j2 + w] ≠ PATTERN[w]. The duel idea (see also Fig. 1): it is impossible that occurrences of the pattern start at both locations j1 and j2 of the text. Moreover, position j1 - 1 + w of the text can equal either position w of the pattern, position j1 - j2 + w of the pattern, or neither (but not both). The idea of a duel between j1 and j2 is to compare this position of the text with each of these two positions of the pattern. Thereby, we can eliminate the possibility that an occurrence of the pattern starts in at least one of j1 or j2.

Now, consider the set of locations of the text such that, at time t of an algorithm, the possibility that an occurrence of the pattern starts at each of them has not (yet) been ruled out. Applying duels between successive pairs of these locations enables us to decrease by a factor of two a bound on the cardinality ("density") of this set.
(2)  A  careful  assignment  of  processors  to  their  jobs  (using  Brent's  theorem). 
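The duel reads a single text character and needs no information beyond the precomputed table. The sketch below is an illustration under stated assumptions: positions are 1-based, j1 > j2, and w is assumed to be a correct witness for the shift j1 - j2 (that is, PATTERN(w) ≠ PATTERN(j1 - j2 + w)). The name duel and its convention of returning the surviving candidates as a set are hypothetical.

```python
def duel(text: str, pattern: str, j1: int, j2: int, w: int) -> set[int]:
    """Return the candidates among {j1, j2} that survive the duel.

    1-based positions, j1 > j2, and w is a witness for the shift d = j1 - j2.
    Only the single text character TEXT(j1 - 1 + w) is inspected.
    """
    d = j1 - j2
    x = pattern[w - 1]          # PATTERN(w): required if PATTERN starts at j1
    y = pattern[d + w - 1]      # PATTERN(d + w): required if PATTERN starts at j2
    z = text[j1 + w - 2]        # TEXT(j1 - 1 + w)
    assert x != y, "w must be a witness for the shift j1 - j2"
    survivors = set()
    if z == x:
        survivors.add(j1)
    if z == y:
        survivors.add(j2)
    return survivors


if __name__ == "__main__":
    # PATTERN = "abc"; w = 1 is a witness for shift 1 ('a' != 'b').
    print(duel("xabc", "abc", 2, 1, 1))  # {2}
```

Because x ≠ y, the returned set has at most one element, which is exactly the statement that a duel eliminates at least one of j1 and j2.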

The text analysis part of the algorithm is described in Section 3 and the pattern analysis part in Section 4.

II. Preliminaries

Most of this section is devoted to definitions and known facts regarding periodicity in strings.

Let u, w be two strings. u is a period of w if w is a prefix of u^k for some k, or equivalently if w is a prefix of uw. We call the shortest period of a string w the period of w. w has period size P if the length of the period of w is P. If w is at least twice as long as its period, we say that w is periodic.
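These definitions can be checked mechanically. The Python sketch below computes the period size directly from the definition by trying every candidate length in turn (quadratic time, chosen for clarity rather than speed); the function names period_size and is_periodic are ours, and returning 0 for the empty string is an arbitrary convention.

```python
def period_size(w: str) -> int:
    """Smallest P >= 1 such that w is a prefix of uw with u = w[:P].

    Equivalently, w[i] == w[i + P] for every valid i.  Returns 0 for "".
    """
    for p in range(1, len(w) + 1):
        if all(w[i] == w[i + p] for i in range(len(w) - p)):
            return p
    return 0  # only reached when w is empty


def is_periodic(w: str) -> bool:
    """w is periodic if it is at least twice as long as its period."""
    return len(w) > 0 and len(w) >= 2 * period_size(w)


if __name__ == "__main__":
    print(period_size("abcabcab"), is_periodic("abcabcab"))  # 3 True
    print(period_size("abaab"), is_periodic("abaab"))        # 3 False
```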

We will use some simple facts about periodicities.

Proposition 1. Let v be a periodic string and let w, |w| ≤ |v|/2, be a period of v. Suppose w itself is periodic and u is a period of w such that w = u^k, k > 1. Then u is a period of v.

Proof. v is a prefix of w^s for some s > 1. Hence, v is a prefix of u^{ks}.

The following notation is used in this paper. Let x be a real number. ⌈x⌉ is the smallest integer which is ≥ x. ⌊x⌋ is the largest integer which is ≤ x. Let u be a string. |u| is the number of characters in the string.

Proposition 2 (The Periodicity Lemma [LS-62]): If w has two periods of sizes P and Q and |w| ≥ P + Q, then w has a period of size gcd(P,Q).
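Proposition 2 can be sanity-checked by exhaustive search over short binary strings. This is only a finite check, not a proof; the helper names has_period and check_periodicity_lemma and the length bound are arbitrary choices of ours.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    """True iff w has a period of size p, i.e. w[i] == w[i + p] for all valid i."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def check_periodicity_lemma(max_len: int) -> bool:
    """Test Proposition 2 on every binary string of length 1 .. max_len."""
    for n in range(1, max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            periods = [p for p in range(1, n + 1) if has_period(w, p)]
            for P in periods:
                for Q in periods:
                    if n >= P + Q and not has_period(w, gcd(P, Q)):
                        return False
    return True


if __name__ == "__main__":
    print(check_periodicity_lemma(10))  # True
```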

In the rest of this section, an occurrence of some pattern at j will mean that the pattern is a substring beginning at position j of a given fixed string z. For proofs of Propositions 3-6 below, see [G-84].

Proposition 3: If v occurs at j and j + P, for any P ≤ |v|/2, then (1) v is periodic with a period of length P, and (2) v occurs at j + P_v, where P_v is the period size of v.

In the rest of this section we consider a periodic string v = u^k u', k > 1, where u is the period of v, u' is a proper prefix of u, and |u| = P.

Proposition 4: If v occurs at j and j + mP, m ≤ k, then u^{m+k} u' occurs at j.

Proposition 5: If v occurs at j and j + Δ, Δ ≤ |v| - P, then Δ is a multiple of P.

We call an occurrence of v at j important if v does not occur at j + P.

Proposition 6: If there are two important occurrences of v at r and s, r > s, then r - s > |v| - P.

Theorem (Brent). Any synchronous parallel algorithm of time t that consists of a total of x elementary operations can be implemented by p processors within a time of ⌈x/p⌉ + t.

Proof of Brent's theorem. Let x_i denote the number of operations performed by the algorithm in time i (Σ_{i=1}^{t} x_i = x). We now use the p processors to "simulate" the algorithm. Since all the operations in time i can be executed simultaneously, they can be computed by the p processors in ⌈x_i/p⌉ units of time. Thus, the whole algorithm can be implemented by p processors in time of

$$\sum_{i=1}^{t} \left\lceil \frac{x_i}{p} \right\rceil \le \sum_{i=1}^{t} \left( \frac{x_i}{p} + 1 \right) = \frac{x}{p} + t \le \left\lceil \frac{x}{p} \right\rceil + t.$$
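The counting in the proof can be illustrated numerically. The sketch below assumes a hypothetical operation profile x_1, ..., x_t (here a halving computation, similar in shape to Step 2 of the text analysis) and compares the simulated time Σ⌈x_i/p⌉ with the bound ⌈x/p⌉ + t. The names brent_time and brent_bound are ours, and the sketch ignores the two implementation problems raised in the remark that follows.

```python
def brent_time(ops_per_step: list[int], p: int) -> int:
    """Time for p processors to simulate the algorithm step by step.

    Step i has ops_per_step[i] independent operations, which take
    ceil(x_i / p) time units when spread over p processors.
    """
    return sum((x + p - 1) // p for x in ops_per_step)


def brent_bound(ops_per_step: list[int], p: int) -> int:
    """The bound ceil(x / p) + t of the theorem (x = total ops, t = steps)."""
    x, t = sum(ops_per_step), len(ops_per_step)
    return (x + p - 1) // p + t


if __name__ == "__main__":
    # Hypothetical halving profile on n = 1024 items: x_i = n / 2^i.
    n = 1024
    profile = [n >> i for i in range(1, 11)]   # 512, 256, ..., 1
    p = n // 10                                # about n / log n processors
    print(brent_time(profile, p), brent_bound(profile, p))  # 18 21
```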


Remark. The proof of Brent's theorem poses two implementation problems. The first is to evaluate x_i at the beginning of time i in the algorithm. The second is to assign the processors to their jobs.

III. Analysis of the text

The algorithm has three steps. In the first step an analysis of the pattern is performed. This analysis is used in the second step to find a sparse set of "suspicious" indices of the text. By suspicious indices we mean indices of the text at which occurrences of the pattern may start. The last step applies a character-by-character check to find at which of the suspicious indices an occurrence of the pattern really starts. In this section we describe the last two steps. The first step is described in the next section.

Definition. Suppose that PATTERN[j, j+1, ..., m] is not a prefix of PATTERN[1, ..., m] for some j, 2 ≤ j ≤ m. That is, there exists an integer w, 1 ≤ w ≤ m-j+1, such that PATTERN(w) ≠ PATTERN((j-1)+w). We say that w is a witness to this mismatch. Observe that w is a witness against the existence of a period of size j-1 in PATTERN.

Output of Step 1. For each j, 2 ≤ j ≤ ⌊m/2⌋ + 1, Step 1 determines whether PATTERN has a period of size j-1 (in which case WITNESS(j) will be 0) and, if not, computes a witness and assigns it to WITNESS(j).
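A brute-force serial computation of this output can serve as a reference when testing an implementation of Step 1. The Python sketch below (quadratic time; the name naive_witness is ours) stores the smallest witness for every j, 2 ≤ j ≤ m, whereas Step 1 is free to store any witness and only has to cover 2 ≤ j ≤ ⌊m/2⌋ + 1.

```python
def naive_witness(pattern: str) -> list[int]:
    """WITNESS as a 1-based list (index 0 unused) for j = 2 .. m.

    WITNESS(j) = 0 if PATTERN has a period of size j - 1; otherwise the
    smallest w, 1 <= w <= m - j + 1, with PATTERN(w) != PATTERN(j - 1 + w).
    WITNESS(1) is always 0.
    """
    m = len(pattern)
    witness = [0] * (m + 1)
    for j in range(2, m + 1):
        for w in range(1, m - j + 2):
            if pattern[w - 1] != pattern[j + w - 2]:   # PATTERN(w) vs PATTERN(j-1+w)
                witness[j] = w
                break
    return witness


if __name__ == "__main__":
    print(naive_witness("abaab"))  # [0, 0, 1, 2, 0, 1]; period 3 gives WITNESS(4) = 0
```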

Steps 2 and 3

For k ≥ 0, the set of k-blocks is {TEXT[1, ..., 2^k], ..., TEXT[ℓ2^k+1, ..., (ℓ+1)2^k], ...}. Steps 2 and 3 depend considerably on whether the pattern is periodic.

Case 1. The pattern is not periodic.

Step 2.
Initialize. for all i, 1 ≤ i ≤ n-m+1 pardo
    MATCH(i) := T
Recall that the goal of our algorithm is that MATCH(i) = T if and only if an occurrence of the pattern starts at i, for any i, 1 ≤ i ≤ n-m+1.

Let us define the k-sparsity property: for each k-block, at most one value of MATCH is T. Namely, each of MATCH[1, ..., 2^k], ..., MATCH[ℓ2^k+1, ..., (ℓ+1)2^k], ... contains at most one T.

The goal of Step 2 is to satisfy (⌊log m⌋-1)-sparsity. However, at the end of Step 2 it is still possible that MATCH(i) is T while there is no occurrence of PATTERN that starts at TEXT(i).

LEFT(k,a) contains the index of the leftmost T in MATCH within k-block number a, 1 ≤ a ≤ ⌈(n-m+1)/2^k⌉, or an indication ('null') that there is no such T. (Initially, LEFT(0,a) = a.)

Let us describe stage k of Step 2. (The input to stage k satisfies (k-1)-sparsity.)

Stage k, 1 ≤ k ≤ ⌊log m⌋-1: Satisfy k-sparsity.

The procedure given below is performed in parallel for all k-blocks. Let a be an integer satisfying 1 ≤ a ≤ ⌈(n-m+1)/2^k⌉. We describe the procedure for k-block a. k-block a is the union of two (k-1)-blocks: 2a and 2a-1.
if LEFT(k-1,2a) = 'null'
    then LEFT(k,a) := LEFT(k-1,2a-1)
    else if LEFT(k-1,2a-1) = 'null'
        then LEFT(k,a) := LEFT(k-1,2a)
        else see below.
(k-1)-sparsity implies that following stage k-1 there is at most one index j1 in (k-1)-block 2a and at most one index j2 in (k-1)-block 2a-1 such that MATCH(j1) = MATCH(j2) = T. The remaining case is where both indices j1 and j2 exist (so j1 > j2). We use the concept of a duel (described informally in the introduction) to eliminate one of these T's using information that exists in WITNESS following Step 1.

Let w be WITNESS(j1-j2+1). Let x = PATTERN(w), y = PATTERN(j1-j2+w) and z = TEXT(j1-1+w). Since j1 and j2 belong to the same k-block, j1-j2 < 2^k. For k ≤ ⌊log m⌋-1 we have 2^k ≤ m/2, so j1-j2 < m/2 is not a period size of the (non-periodic) pattern and j1-j2+1 lies in the range covered by Step 1; this implies w ≠ 0. w is a witness that PATTERN[j1-j2+1, ..., m] is not a prefix of PATTERN. Namely, x ≠ y.

If an occurrence of PATTERN starts at j1, then x = z.
If an occurrence of PATTERN starts at j2, then y = z.

x ≠ y implies that at most one of the latter two equalities can be satisfied, and therefore at most one of these two occurrences may hold. We use z to eliminate the possibility of (at least) one of these occurrences:
if z ≠ y
    then MATCH(j2) := F
if z ≠ x
    then MATCH(j1) := F

Finally,
if MATCH(j2) = T
    then LEFT(k,a) := j2
    else if MATCH(j1) = T
        then LEFT(k,a) := j1
        else LEFT(k,a) := 'null'
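The stages of Step 2 can be simulated serially, one k-block at a time, to check the sparsity and safety claims on small inputs. In the sketch below the WITNESS table is produced by a brute-force stand-in (witness_table) rather than by Step 1, the pattern is assumed to be non-periodic (otherwise a needed witness may be 0), and all function names are hypothetical. Python's None plays the role of 'null'.

```python
def witness_table(pattern: str) -> list[int]:
    """Brute-force stand-in for Step 1: 1-based WITNESS, 0 = period/no witness."""
    m = len(pattern)
    wit = [0] * (m + 1)
    for j in range(2, m + 1):
        for w in range(1, m - j + 2):
            if pattern[w - 1] != pattern[j + w - 2]:   # PATTERN(w) vs PATTERN(j-1+w)
                wit[j] = w
                break
    return wit


def step2_nonperiodic(text: str, pattern: str) -> list[bool]:
    """Serial simulation of Step 2, Case 1 (pattern assumed non-periodic).

    Returns MATCH (1-based, index 0 unused) after stages 1 .. floor(log m) - 1.
    """
    n, m = len(text), len(pattern)
    wit = witness_table(pattern)
    N = n - m + 1                                   # number of candidate starts
    match = [False] + [True] * max(N, 0)
    left = [None] + list(range(1, N + 1))           # LEFT(0, a) = a
    for k in range(1, m.bit_length() - 1):          # k = 1 .. floor(log2 m) - 1
        blocks = -(-N // (1 << k))                  # ceil(N / 2^k)
        new_left = [None] * (blocks + 1)
        for a in range(1, blocks + 1):              # "pardo" over k-blocks
            j2 = left[2 * a - 1]                    # candidate in (k-1)-block 2a-1
            j1 = left[2 * a] if 2 * a < len(left) else None
            if j1 is None or j2 is None:
                new_left[a] = j2 if j1 is None else j1
                continue
            w = wit[j1 - j2 + 1]                    # nonzero: j1 - j2 < 2^k <= m/2
            x = pattern[w - 1]                      # PATTERN(w)
            y = pattern[j1 - j2 + w - 1]            # PATTERN(j1 - j2 + w)
            z = text[j1 + w - 2]                    # TEXT(j1 - 1 + w)
            if z != y:
                match[j2] = False
            if z != x:
                match[j1] = False
            new_left[a] = j2 if match[j2] else (j1 if match[j1] else None)
        left = new_left
    return match


if __name__ == "__main__":
    res = step2_nonperiodic("aaaaaaaabaaa", "aaaaaaab")
    print([i for i, b in enumerate(res) if b])  # [2, 5]
```

In such runs every true occurrence survives all duels, and after the last stage each (⌊log m⌋-1)-block holds at most one T; false candidates (such as 5 above) may remain, and Step 3 removes them.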
Complexity. Stage k of Step 2 needs O(n/2^k) operations and O(1) time. Therefore, Step 2 needs a total of O(n) operations and O(log n) time.

Step 3.

For each a, 1 ≤ a ≤ n-m+1, such that MATCH(a) = T, check, character by character, whether an occurrence of the pattern starts at a.
for all j, 1 ≤ j ≤ ⌈(n-m+1)/2^{⌊log m⌋-1}⌉ pardo
    for all i, 1 ≤ i ≤ m pardo
        (Denote t(j) = LEFT(⌊log m⌋-1, j).)
        if t(j) ≠ 'null'
            then if TEXT(t(j)+i-1) ≠ PATTERN(i)
                then MATCH(t(j)) := F   (simultaneous writes are possible)

This results in MATCH(i) = T (for any i, 1 ≤ i ≤ n-m+1) if and only if an occurrence of the pattern starts at location i of the text, as we wanted.
Complexity. O(mn/2^{⌊log m⌋-1}) = O(n) operations (since 2^{⌊log m⌋-1} > m/4) and O(1) time.
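Step 3 can be simulated serially as well. The sketch below (hypothetical name step3_verify) checks every remaining candidate against the whole pattern and counts the character comparisons; under (⌊log m⌋-1)-sparsity there are at most ⌈(n-m+1)/2^{⌊log m⌋-1}⌉ candidates, so the count stays O(n). Scanning MATCH directly instead of LEFT is a simplification of ours.

```python
def step3_verify(text: str, pattern: str, match: list[bool]) -> tuple[list[bool], int]:
    """Serial simulation of Step 3 of Case 1.

    match is 1-based (index 0 unused) and holds the candidates left by Step 2.
    Each candidate t is compared with PATTERN at all m offsets; in the paper
    these comparisons run in parallel and every failing processor writes F.
    Returns the final MATCH and the number of character comparisons.
    """
    m = len(pattern)
    out = list(match)
    comparisons = 0
    for t in range(1, len(match)):
        if not match[t]:
            continue
        for i in range(1, m + 1):                  # "for all i pardo"
            comparisons += 1
            if text[t + i - 2] != pattern[i - 1]:  # TEXT(t+i-1) vs PATTERN(i)
                out[t] = False
    return out, comparisons


if __name__ == "__main__":
    cand = [False, False, True, True, False, True, False]   # candidates 2, 3, 5
    out, comps = step3_verify("xabcabcx", "abc", cand)
    print([i for i, b in enumerate(out) if b], comps)       # [2, 5] 9
```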

Case 2. The pattern is periodic.

Say that the pattern is u^s v, where u is the period of the pattern, |u| = P, and |v| < P, and let Q = |v| = m - sP (< P).

Step 2.1. Rerun Step 1 for PATTERN[1, ..., 2P] instead of the whole pattern.

Step 2.2. Perform ⌊log P⌋ rounds of duels with respect to the text (similar to Step 2 of Case 1 above). As a result, each ⌊log P⌋-block of the text will have at most one index where an occurrence of u² may start (to be called a suspicious index, as before). Observe that since the information in WITNESS is now based only on u², every index of the text at which an occurrence of u² starts is suspicious.

Step 3.1. For every suspicious index, check, character by character, whether an occurrence of u²v starts at it (similar to Step 3 of Case 1 above).

Steps 2.1, 2.2 and 3.1 result in the following: for every i, 1 ≤ i ≤ n-2P-Q+1, MATCH(i) = T if and only if there is an occurrence of u²v at i. These steps need a total of O(n) operations and O(log n) time.

Step 3.2. Our present goal is to find, for each such i, the maximum k such that an occurrence of u^k v starts at i. Then, if k ≥ s, we conclude that an occurrence of the pattern starts at i. For completeness of the presentation, we give below the slightly tedious implementation of Step 3.2.

We will use a standard balanced binary tree with n-2P-Q+1 leaves to guide the computation. Denote β = n-2P-Q+1. Each node of the tree is a pair (x,y), where x is the level of the node in the tree and y is its serial number among the other nodes of the same level. The leaves of the tree are (0,1), ..., (0,2^{⌈log β⌉}). A node (x,y) of the tree is the father of two nodes: (x-1,2y-1), its left son, and (x-1,2y), its right son. For each i, 1 ≤ i ≤ β, such that MATCH(i) = T, we compute below into LARGEST(i) the largest index ℓ such that MATCH(ℓ) = T and TEXT[i, ..., ℓ-1] = u^{(ℓ-i)/P}, where (ℓ-i)/P is an integer. Adding two to (ℓ-i)/P yields the maximum k as required.

Serially, this ℓ can easily be computed in linear time by scanning the text from right to left. Our parallel implementation uses auxiliary arrays A(x,y) and B(x,y), whose entries correspond to nodes of the binary tree.
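A serial right-to-left scan realizes the linear-time computation mentioned above. In the sketch below (hypothetical name largest_serial), MATCH is a 1-based list of length β + 1 as produced by Steps 2.1-3.1, and LARGEST(i) is found by following the chain i, i + P, i + 2P, ... of T entries, reusing the answer already computed for i + P; entries with MATCH(i) = F are left as None, and indices beyond β are treated as F (our assumption).

```python
from typing import Optional


def largest_serial(match: list[bool], P: int) -> list[Optional[int]]:
    """LARGEST by one right-to-left scan (1-based; match[0] unused)."""
    beta = len(match) - 1
    largest: list[Optional[int]] = [None] * (beta + 1)
    for i in range(beta, 0, -1):                   # scan from right to left
        if match[i]:
            if i + P <= beta and match[i + P]:
                largest[i] = largest[i + P]        # i continues the chain at i + P
            else:
                largest[i] = i                     # MATCH(i + P) = F: chain ends at i
    return largest


if __name__ == "__main__":
    # MATCH for occurrences of u^2 v = "ababa" (u = "ab", P = 2) in "ababababa".
    match = [False, True, False, True, False, True]
    print(largest_serial(match, 2))  # [None, 5, None, 5, None, 5]
```

For i = 1 this gives k = (5 - 1)/2 + 2 = 4, i.e. u^4 v = "ababababa" starts at position 1.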
Initialization.
for all i, 1 ≤ i ≤ 2^{⌈log β⌉} pardo
    if MATCH(i) = T and MATCH(i+P) = F
        (Comment. In case the if condition is satisfied, the maximum ℓ for i is i itself.)
        then A(0,i) := i
        else A(0,i) := ∞
The computation has 2⌈log β⌉ stages. Each of the first ⌈log β⌉ stages consists of moving one level up the tree, starting from the leaves and ending at the root. They result in each A(x,y) having the minimum A(0,i) over its leaf-descendants.
Stage r, 1 ≤ r ≤ ⌈log β⌉.
for all i, 1 ≤ i ≤ 2^{⌈log β⌉-r} pardo
    A(r,i) := min(A(r-1,2i-1), A(r-1,2i))
Each of the last ⌈log β⌉ stages consists of moving one level down the tree, starting at the root and ending at the leaves. The goal in these stages is to compute into B(0,i), 1 ≤ i ≤ β, the smallest j for which the if condition above is satisfied and j > i. It can be readily verified by decreasing induction on the level r that B(r,i) holds the smallest j for which the if condition is satisfied such that j > i·2^r.
Set B(⌈log β⌉,1) := ∞.
Stage 2⌈log β⌉+1-r, r = ⌈log β⌉ downto 1.
for all i, 1 ≤ i ≤ 2^{⌈log β⌉-r} pardo
    B(r-1,2i) := B(r,i)
    if A(r-1,2i) < ∞
        then B(r-1,2i-1) := A(r-1,2i)
        else B(r-1,2i-1) := B(r,i)
In order to complete the computation of LARGEST, perform:
for all i, 1 ≤ i ≤ β pardo
    if MATCH(i) = T and MATCH(i+P) = F
        then LARGEST(i) := i
        else LARGEST(i) := B(0,i)
It is straightforward to see that for every i, 1 ≤ i ≤ β, such that MATCH(i) = T, LARGEST(i) has the desired value. Finally,
for all i, 1 ≤ i ≤ β, such that MATCH(i) = T pardo
    k := (LARGEST(i)-i)/P + 2
    if k < s
        then MATCH(i) := F
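The 2⌈log β⌉ stages above can be simulated level by level and compared with the serial scan. In the Python sketch below (hypothetical name largest_by_tree), A[r][y] and B[r][y] stand for A(r,y) and B(r,y), float('inf') stands for ∞, and a MATCH index beyond β is treated as F, which is our assumption for the boundary. The result agrees with the chain-following definition when MATCH comes from genuine occurrences of u²v, because by Proposition 5 two such occurrences no more than P + Q apart differ by a multiple of P.

```python
from typing import Optional

INF = float("inf")


def largest_by_tree(match: list[bool], P: int) -> list[Optional[int]]:
    """Simulate the up-sweep and down-sweep stages of Step 3.2.

    match is 1-based (match[0] unused) with beta = len(match) - 1 leaves.
    A[r][y] = min of A[0][.] over the leaves below node (r, y).
    B[r][y] = smallest j > y * 2^r satisfying the if condition.
    """
    beta = len(match) - 1
    h = max(0, (beta - 1).bit_length())            # ceil(log2 beta)
    leaves = 1 << h

    def ends_chain(i: int) -> bool:                # MATCH(i) = T and MATCH(i+P) = F
        return i <= beta and match[i] and not (i + P <= beta and match[i + P])

    A = [[INF] + [i if ends_chain(i) else INF for i in range(1, leaves + 1)]]
    for r in range(1, h + 1):                      # stages 1 .. h: move up
        below = A[r - 1]
        A.append([INF] + [min(below[2 * y - 1], below[2 * y])
                          for y in range(1, (leaves >> r) + 1)])

    B = [[] for _ in range(h + 1)]
    B[h] = [INF, INF]                              # B(h, 1) := infinity at the root
    for r in range(h, 0, -1):                      # stages h+1 .. 2h: move down
        child = [INF] * ((leaves >> (r - 1)) + 1)
        for y in range(1, (leaves >> r) + 1):
            child[2 * y] = B[r][y]
            right_min = A[r - 1][2 * y]
            child[2 * y - 1] = right_min if right_min < INF else B[r][y]
        B[r - 1] = child

    largest: list[Optional[int]] = [None] * (beta + 1)
    for i in range(1, beta + 1):
        if match[i]:
            largest[i] = i if ends_chain(i) else int(B[0][i])
    return largest


if __name__ == "__main__":
    print(largest_by_tree([False, True, False, True, False, True], 2))
    # [None, 5, None, 5, None, 5]
```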
Complexity. The number of operations required by Step 3.2 is proportional to the number of nodes in the binary tree (O(β)) and the time is proportional to its height (O(log β)). So Steps 2 and 3 of Case 2 also need a total of O(n) operations and O(log n) time. Apply Brent's theorem to get a bound of O(log n) time using n/log n processors for both Case 1 and Case 2. The reader is invited to verify that here, and throughout the rest of the algorithm, the implementation problems in the remark following Brent's theorem can be readily overcome.

IV. Step 1 - Analysis of the pattern

The pattern is the input for Step 1. Step 1 consists of manipulating the array WITNESS, whose length is m. Recall that in the previous section we already specified what WITNESS must contain following Step 1. It is initialized as follows.

for all j, 1 ≤ j ≤ m pardo
    WITNESS(j) := 0   (Interpretation. PATTERN[j, j+1, ..., m] is "suspected" to be a prefix of the pattern.)
In this section, the set of k-blocks refers to the pattern. It is {PATTERN[1, ..., 2^k], ..., PATTERN[ℓ2^k+1, ..., (ℓ+1)2^k], ...}.

Step 1 consists of ⌊log m⌋-2 or ⌊log m⌋-3 iterations (called stages) and a terminal stage. Later in this section we describe the terminal stage and how to determine the exact number of iterations to be performed. Following stage k, the following three properties are satisfied.

(1) The k-certainty property. For j, 1 ≤ j ≤ 2^k, WITNESS(j) = 0 if and only if an occurrence of PATTERN[1, ..., 2^{k+1}-j+1] starts at PATTERN(j). That is, for 1 ≤ j ≤ 2^k, WITNESS(j) = 0 indicates that we are certain that there is such an occurrence at j. The k-certainty property can alternatively be presented as follows. Imagine that PATTERN[1, ..., 2^{k+1}] were the whole pattern. Then WITNESS[1, ..., 2^k] has its final values as required by the output definition of Step 1. Obviously, WITNESS(1) must always be zero.

(2) The k-sparsity property (in this section it applies to the pattern). If WITNESS[2, ..., 2^k] does not have any zero, then WITNESS of each k-block has at most one zero. (That is, each of WITNESS[1, ..., 2^k], ..., WITNESS[ℓ2^k+1, ..., (ℓ+1)2^k], ... contains at most one zero.)

(3) The k-lookahead property. WITNESS(i) ≤ 2^{k+1} for every index i of the pattern.

Satisfying the k-certainty and the k-sparsity properties is a fairly intuitive goal, while satisfying the k-lookahead property may seem counter-intuitive, particularly since satisfying it implied, in several places, not using available information which seemed as if it would speed up the algorithm. Therefore, our presentation focuses on satisfying the first two properties. We prove later (in Lemma 1) that the k-lookahead property is satisfied as well.

We now describe stage k+1 of Step 1. We closely follow the illustrative description given in Fig. 2.

After stage k we must be either at the arrow leading to Box 2 or at the arrow leading to Box 4. In either case k-certainty is satisfied.

k-sparsity is satisfied when we enter Box 2. k-sparsity need not be satisfied in a periodic mode (i.e., if we were at Box 4 in stage k and proceeded to stage k+1 at Box 4).

LEFT(k,a) relates in this section to the pattern. It contains the index of the leftmost zero in WITNESS within k-block number a, 1 ≤ a ≤ ⌈m/2^k⌉, or an indication ('null') that there is no such zero. If PATTERN[1, ..., 2^{k+1}] has a period of size ≤ 2^k - 1, then PERIODICITY(k) contains the period size.
Let us specify the instructions in each of the boxes. The instructions for Boxes 2-5 assume that they are activated in stage k+1.
Box 1. (Start)
for all j, 1 ≤ j ≤ m pardo
    WITNESS(j) := 0
if PATTERN(1) ≠ PATTERN(2)
    then start stage 1 at Box 2
    else start stage 1 at Box 4 (enter a periodic mode)

Box 2. (Upon entering Box 2, k-sparsity and k-certainty are satisfied. WITNESS[2, ..., 2^k] has no zeros.)

If a suspected periodicity has been found, start stage k+2 in a periodic mode (Box 4). Otherwise, proceed to Box 3. Specifically,

if LEFT(k,2) ≠ 'null' (i.e., does WITNESS of k-block number 2 have a zero? Note that k-sparsity implies that there is at most one such zero.)
    then (let x = LEFT(k,2))
        for all j, 1 ≤ j ≤ 2^{k+2}-x+1 pardo
            if PATTERN(j) ≠ PATTERN(x-1+j)
                then WITNESS(x) := j   (Note that the if statement condition may hold for several j's. This would result in simultaneous writes into WITNESS(x).)
        if WITNESS(x) = 0 (i.e., the condition did not hold for any j)
            then PERIODICITY(k+1) := x-1; start stage k+2 at Box 4
Proceed to Box 3. [The situation is that WITNESS[2, ..., 2^{k+1}] has no zeros.]

Box 3. (k-sparsity and (k+1)-certainty are satisfied. For every j, 2 ≤ j ≤ 2^{k+1}, WITNESS(j) ≠ 0.)

Satisfy (k+1)-sparsity. The procedure given below is performed in parallel for all (k+1)-blocks. Let a be an integer satisfying 2 ≤ a ≤ ⌈m/2^{k+1}⌉. We describe the instructions for (k+1)-block a. (k+1)-block a is the union of two k-blocks: 2a and 2a-1.
if LEFT(k,2a) = 'null'
    then LEFT(k+1,a) := LEFT(k,2a-1)
    else if LEFT(k,2a-1) = 'null'
        then LEFT(k+1,a) := LEFT(k,2a)
        else see below.
k-sparsity implies that there is at most one index j1 in k-block 2a and at most one index j2 in k-block 2a-1 such that WITNESS(j1) = WITNESS(j2) = 0. The remaining case is that both indices j1 and j2 exist. Here the concept of a duel enters again. We perform a duel between these indices in which one of these zeros will be eliminated, similar to the previous section. Let w = WITNESS(j1-j2+1), x = PATTERN(w), y = PATTERN(j1-j2+w) and z = PATTERN(j1-1+w). (w ≠ 0, since 2 ≤ j1-j2+1 ≤ 2^{k+1}.)

(Implementation Remark 1. In the present description, we ignore the case where j1-1+w > m (or, more generally, any reference to an index of the pattern which is > m). The algorithm proceeds as if PATTERN(j1-1+w) matches any possible character. However, the k-lookahead property prevents this case from affecting the correctness of the algorithm, as explained in the presentation of the terminal stage of Step 1 later.)
We use z to eliminate (at least) one of the zeros at j1 and j2:

if z ≠ y
    then WITNESS(j2) := j1-j2+w
if z ≠ x
    then WITNESS(j1) := w

Finally,

if WITNESS(j2) = 0
    then LEFT(k+1,a) := j2
    else if WITNESS(j1) = 0
        then LEFT(k+1,a) := j1
        else LEFT(k+1,a) := 'null'
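The Box 3 duel on the pattern can be sketched in the same way as the duel on the text. In the Python below (hypothetical name pattern_duel), WITNESS is a 1-based list updated in place, witness[j1 - j2 + 1] is assumed to hold a valid witness (as Box 3 guarantees), and, following Implementation Remark 1, a reference past the end of the pattern is treated as matching every character, so no zero is eliminated in that case.

```python
def pattern_duel(pattern: str, witness: list[int], j1: int, j2: int) -> None:
    """Box 3 duel between two zeros of WITNESS at j1 > j2 (1-based, in place).

    Assumes witness[j1 - j2 + 1] is a valid witness for the shift j1 - j2.
    If j1 - 1 + w exceeds m, z acts as a wildcard and nothing is changed.
    """
    m = len(pattern)
    w = witness[j1 - j2 + 1]
    x = pattern[w - 1]                    # PATTERN(w)
    y = pattern[j1 - j2 + w - 1]          # PATTERN(j1 - j2 + w)
    if j1 - 1 + w > m:
        return
    z = pattern[j1 + w - 2]               # PATTERN(j1 - 1 + w)
    if z != y:
        witness[j2] = j1 - j2 + w         # contradicts a period of size j2 - 1
    if z != x:
        witness[j1] = w                   # contradicts a period of size j1 - 1


if __name__ == "__main__":
    wt = [0] * 9
    wt[3] = 1                             # witness for shift 2 in "abcdabce"
    pattern_duel("abcdabce", wt, 6, 4)
    print(wt[4], wt[6])                   # 3 1
```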

Box 4. Periodic mode. (Recall that we are presently describing stage k+1.) (Say that the last transition from Box 2 or 5 occurred at stage k1+1. k-certainty and k1-sparsity are satisfied. Say the period size of PATTERN[1, ..., 2^{k+1}] (the suspected periodicity) is P.)

We pick indices j of the first (k+1)-block such that WITNESS(j) = 0 and j-1 is not divisible by P. The fact that k-certainty was satisfied upon entering Box 4 implies that each such j must belong to the second k-block. For each such index j, we select the index i such that j-P < i < j and i-1 is divisible by P, and perform a "one way" duel between i and j in which only an assignment into WITNESS(j) can be performed. Explicitly, i = ⌊(j-1)/P⌋P + 1. As we see below, it is very useful that j-i < P.
for all j, 2^k < j ≤ 2^{k+1} pardo
    if WITNESS(j) = 0 and (j-1) mod P ≠ 0
        then (let w = WITNESS(((j-1) mod P) + 1))
            if PATTERN(j-1+w) ≠ PATTERN(w)
                then WITNESS(j) := w
(Explanation. We prove later (Lemma 1) that after stage t, WITNESS(ℓ) ≤ 2^{t+1} for every index ℓ. Therefore, if P is the period of PATTERN[1, ..., 2^{k+2}], then for all the j indices that satisfy the first if condition above, the second if condition must be satisfied as well. Hence, this instruction will result in WITNESS(j) ≠ 0.)
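The one-way duels of Box 4 assign only WITNESS(j). The sketch below (hypothetical name one_way_duels) assumes that WITNESS(2), ..., WITNESS(P) already hold valid witnesses, as Claim 2 takes for granted, and treats a pattern index past m as matching any character, in the spirit of Implementation Remark 1.

```python
def one_way_duels(pattern: str, witness: list[int], k: int, P: int) -> None:
    """Box 4 one-way duels for j in (2^k, 2^{k+1}] (1-based, in place).

    Only WITNESS(j) is ever assigned; if j - 1 + w exceeds m no
    assignment is made (the out-of-range character matches anything).
    """
    m = len(pattern)
    for j in range((1 << k) + 1, min(1 << (k + 1), m) + 1):
        d = (j - 1) % P                       # j - i, where i = floor((j-1)/P)*P + 1
        if witness[j] == 0 and d != 0:
            w = witness[d + 1]                # witness for the shift j - i
            if j - 1 + w <= m and pattern[j + w - 2] != pattern[w - 1]:
                witness[j] = w                # PATTERN(j-1+w) != PATTERN(w)


if __name__ == "__main__":
    p = "abc" * 4                             # period P = 3
    wt = [0] * 13
    wt[2] = wt[3] = 1                         # witnesses for shifts 1 and 2
    one_way_duels(p, wt, 2, 3)
    print(wt[5:9])                            # [1, 1, 0, 1]; WITNESS(7) = 0 since 6 = 2P
```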
Claim 1. Suppose P is not a period of PATTERN[1, ..., 2^{k+2}]. Then for at most four indices j that satisfied the first if condition, WITNESS(j) remains 0. To show this we need the following.

Claim 2. WITNESS(i) ≤ 2^{⌊log P⌋+2} for 2 ≤ i ≤ P. Proof. Apply Lemma 1 and the fact that all these indices of WITNESS were updated before we entered Box 4 at stage ⌊log P⌋+1, when P became the suspected periodicity. We can conclude from the proof the following.

Corollary 1. (⌊log P⌋)-sparsity is satisfied. From Claim 2 we can conclude the following.
Corollary 2. For each index j ≤ 2^{k+1} - 2^{⌊log P⌋+2} that satisfied the first if condition, WITNESS(j) ≠ 0.

Proof of Claim 1. By Corollary 2, only indices j, 2^{k+1} - 2^{⌊log P⌋+2} < j ≤ 2^{k+1}, that satisfied the first if condition may have WITNESS(j) = 0. These indices may be included in at most four (⌊log P⌋)-blocks. Corollary 1 implies that WITNESS of at most one index in each of these blocks has a zero. □

Check whether the periodicity continues until index $2^{k+2}$ of the pattern.
If yes, start stage k+2 at Box 4. (Observe that (k+1)-certainty is satisfied.)
Suppose the periodicity does not continue until index $2^{k+2}$. Consider the
possibility that some multiple of P which is $< 2^{k+1}$ is a period of
PATTERN[$1,\dots,2^{k+2}$]. The character of the pattern which caused the assignment
into WITNESS(P+1) is also a counterexample to each of these possibilities. Update
this into WITNESS. As a result, WITNESS[$2,\dots,2^{k+1}$] will have at most four zeros,
whose indices are $> 2^k$ (Claim 1). Check, character by character, whether any of these
zeros represents a period of PATTERN[$1,\dots,2^{k+2}$], and update WITNESS appropriately.
Obviously at most one of these zeros represents a period of PATTERN[$1,\dots,2^{k+2}$]
(Proposition 2). Proceed to Box 5. (Observe that (k+1)-certainty is satisfied.)
for all j, $2^{k+1} < j \le 2^{k+2}$ pardo
    if PATTERN(j) ≠ PATTERN(((j-1) mod P) + 1)
    then WITNESS(P+1) := j-P   (simultaneous writes are possible)
if WITNESS(P+1) = 0
    (In words: is P still the suspected periodicity?)
then start stage k+2 at Box 4
else for all j, $2 \le j \le \lfloor (2^{k+1}-1)/P \rfloor$ pardo
         WITNESS(jP+1) := WITNESS(P+1) - (j-1)P
     (Explanation. Observe that jP + WITNESS(jP+1) is the same, namely P + WITNESS(P+1),
     for each of these j-s. As we said before, this means that the same character
     of the pattern contradicts periods of sizes jP, for each of these j-s.)
     for each i, $2^k < i \le 2^{k+1}$, such that WITNESS(i) = 0 do
         (there are at most four such i-s)
         for all j, $1 \le j \le 2^{k+2}-i+1$ pardo
             if PATTERN(j) ≠ PATTERN(i-1+j)
             then WITNESS(i) := j   (simultaneous writes are possible)
         if WITNESS(i) = 0   (i.e., the condition did not hold for any j)
         then PERIODICITY(k+1) := i-1
     proceed to Box 5
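The first part of Box 4 (checking whether P remains a period up to index $2^{k+2}$ and, if not, charging the same mismatching character against every multiple of P) can be sketched in Python as below. Assumptions: 1-based indices are simulated with offsets; the priority convention (the smallest j wins) is emulated by taking min() over all processors that would write; the range is clipped at the pattern length, which the pseudocode does not do; and `box4_check` is a hypothetical name. The final character-by-character check of the remaining zeros is omitted.

```python
def box4_check(pattern: str, witness: list, P: int, k: int) -> bool:
    """Sketch of the first part of Box 4 at stage k+1.

    Assumes P is a period of PATTERN[1..2^(k+1)] and WITNESS(P+1) == 0.
    Returns True if P is still the suspected periodicity up to 2^(k+2).
    """
    lo, hi = 2 ** (k + 1), min(2 ** (k + 2), len(pattern))
    # Every processor j that sees a mismatch would write j - P into WITNESS(P+1);
    # under the priority convention the smallest such j succeeds, hence min().
    mismatches = [j for j in range(lo + 1, hi + 1)
                  if pattern[j - 1] != pattern[(j - 1) % P]]   # PATTERN(((j-1) mod P)+1)
    if not mismatches:
        return True
    witness[P + 1] = min(mismatches) - P
    # The same character also refutes every multiple jP < 2^(k+1) as a period.
    for j in range(2, (lo - 1) // P + 1):
        witness[j * P + 1] = witness[P + 1] - (j - 1) * P
    return False


periodic = "aba" * 6                              # period 3 throughout
broken = periodic[:10] + "c" + periodic[11:]      # PATTERN(11) changed from 'b' to 'c'
w1 = [0] * (len(periodic) + 1)
w2 = [0] * (len(broken) + 1)
print(box4_check(periodic, w1, 3, 2))             # True: P = 3 survives up to index 16
print(box4_check(broken, w2, 3, 2), w2[4], w2[7]) # False 8 5
```

In the broken pattern both WITNESS(4) = 8 and WITNESS(7) = 5 point at the same text character, PATTERN(11), as the explanation in the pseudocode predicts.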

Box 5. ((k+1)-certainty is satisfied, WITNESS[$2,\dots,2^k$] has no zeros, and
$k_1$-sparsity is satisfied.)

Satisfy k-sparsity. This is done in $k-k_1$ iterations. In iteration t, $1 \le t \le k-k_1$,
$(k_1+t)$-sparsity is satisfied. Each iteration is similar to the way in
which (k+1)-sparsity is satisfied in Box 3. The details are left to the reader;
no new ideas are required. If WITNESS[$2^k+1,\dots,2^{k+1}$] has a zero (PERIODICITY(k+1)
≠ 'null'), then start stage k+2 at Box 4. Otherwise, proceed to satisfy
(k+1)-sparsity at Box 3.

Next, we give a complexity analysis of the stages described above. Later,
the terminal stage of Step 1 is presented. For reasons of clarity, the main points
required for a correctness proof of Step 1 will be combined into the presentation
of the terminal stage.

Complexity analysis. Stage k: Each of Boxes 2, 3 and 4 is visited at most
once in each stage. Box 2 needs O($2^k$) operations and O(1) time. Box 3 needs O(1)
operations and O(1) time for each of the $\le \lceil m/2^k \rceil$ k-blocks in order to satisfy
k-sparsity. Box 4 needs O($2^k$) operations and O(1) time. Since k increases from 1
to $\lfloor \log m \rfloor - 2$, we have so far O(m) operations and O(log m) time. Box 5: For each
i, we satisfy i-sparsity at most once during these stages. As in Box 3, satisfying
i-sparsity needs O($m/2^i$) operations and O(1) time, and the same total bound of
O(m) operations and O(log m) time applies.

Apply Brent's theorem to get O(log m) parallel running time using m/log m
processors.
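For completeness, the counts above can be summed explicitly. The derivation below writes $L = \lfloor \log m \rfloor$ and assumes Brent's theorem in its usual form: a computation with x operations and t time runs in O(x/p + t) time on p processors.

```latex
\begin{aligned}
\text{Boxes 2 and 4:}\quad & \sum_{k=1}^{L-2} O(2^k) = O(2^{L-1}) = O(m),\\
\text{Boxes 3 and 5:}\quad & \sum_{k=1}^{L} O\!\left(\left\lceil m/2^k \right\rceil\right)
  = O\!\left(\sum_{k\ge 1} \frac{m}{2^k} + L\right) = O(m),\\
\text{time:}\quad & O(1)\ \text{per box visit or Box 5 iteration, and } O(L)\ \text{of them}
  \;\Rightarrow\; t = O(\log m),\\
\text{Brent:}\quad & T_p = O\!\left(\frac{x}{p} + t\right),\quad x = O(m),\ p = \frac{m}{\log m}
  \;\Rightarrow\; T_p = O(\log m).
\end{aligned}
```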

The terminal stage and correctness of Step 1.

Recall that our goal is to determine WITNESS(j), $2 \le j \le \lfloor m/2 \rfloor + 1$. The only
problem that may arise in arguing that Step 1 achieves this goal relates to
Implementation Remark 1 in Box 3. There, we describe a situation where the
information in WITNESS implies a comparison with a character of the pattern whose
index is > m. By Implementation Remark 1, the outcome of such a comparison would
not affect the values in WITNESS. In order to proceed with this
discussion we need the following lemma.

Lemma 1 (the k-lookahead property). Following stage k, WITNESS(i) $\le 2^{k+1}$ for
every index i of the pattern.

Proof. By induction on k. For k = 0 (before stage 1) the lemma readily holds.
We assume the lemma holds for k and show that it holds for k+1. Let us check all
instructions of stage k+1 in which an assignment into WITNESS(i) can be performed.
The order in which boxes are visited in stage k+1 is first Box 2 or Box 4, and then
Boxes 5 and 3 (some of the boxes may not be visited at all during this stage).
Observation. All assignments into WITNESS(i) in Boxes 2 and 4 satisfy
$i-1+\mathrm{WITNESS}(i) \le 2^{k+2}$. Let us prove this observation. In Box 2 there is only one
instruction in which an assignment into WITNESS(i) may be performed. In this
assignment, $i \le 2^{k+1}$, and the number assigned is at most $2^{k+2}-i+1$. There are three
instructions in Box 4 in which assignments into WITNESS(i) may be performed. In
the first assignment, $i \le 2^{k+1}$ and WITNESS(i) is assigned a value already in
WITNESS(j) for some index j, which was computed in a previous stage. By the
inductive hypothesis, this value is at most $2^{k+1}$, and therefore $i-1+\mathrm{WITNESS}(i) <
2^{k+2}$. In the second and third assignments, $i \le 2^{k+1}$ and the numbers being
assigned are at most $2^{k+2}-i+1$. This completes the proof of the Observation. In each
iteration of Box 5 and in Box 3, there are two assignments into WITNESS(i), where
$i > 2^{k+1}$. One is of the form j-1+WITNESS(j) and the other is of the form
WITNESS(j). In both assignments $j \le 2^{k+1}$. The fact that the ranges of i and j
above do not overlap implies that an assignment into WITNESS(i) in Box 3 or in any
iteration of Box 5 cannot be affected by an assignment into WITNESS(j) in a
previous iteration of Box 5 at stage k+1. Let us take a closer look at the
potentially more problematic assignment in Boxes 3 and 5, namely the one of the form
j-1+WITNESS(j). We have to show that $j-1+\mathrm{WITNESS}(j) \le 2^{k+2}$. If WITNESS(j)
received its value before stage k+1, then this is implied by the inductive
hypothesis. If WITNESS(j) received its value in Box 2 or 4 of stage k+1, then
this is implied by the Observation. We conclude that following stage k+1,
WITNESS(i) $\le 2^{k+2}$ for every index i. ∎

How does the algorithm determine whether to perform $\lfloor \log m \rfloor - 2$ or $\lfloor \log m \rfloor - 3$
stages? Recall that we are interested only in entries of WITNESS whose indices are
$\le \lfloor m/2 \rfloor + 1$. Let i be an index of the pattern such that $1 < i \le \lfloor m/2 \rfloor + 1$. Lemma 1
implies that in the first k stages there is no reference to an index of the
pattern which is $> i + 2^{k+1}$. The idea is to run the algorithm as long as there
is a (k+1)-block (which is of size $2^{k+1}$) that can serve as a "buffer" between
$\lfloor m/2 \rfloor + 1$ and the end of the pattern. Specifically, we look for the
maximum k for which there exists a (k+1)-block all of whose entries are $\le m$
and $> \lfloor m/2 \rfloor + 1$. The situation is illustrated in Fig. 3.

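Case 1. $m \ge 3 \cdot 2^{\lfloor \log m \rfloor - 1}$.
Here $[2^{\lfloor \log m \rfloor}+1, \dots, 3 \cdot 2^{\lfloor \log m \rfloor - 1}]$ (i.e., ($\lfloor \log m \rfloor - 1$)-block number 3) is the
buffer. We can perform $\lfloor \log m \rfloor - 2$ stages with this buffer protecting us from
referencing any index $> m$ from the first four ($\lfloor \log m \rfloor - 2$)-blocks (which include
$\lfloor m/2 \rfloor + 1$).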

Case 2. $m < 3 \cdot 2^{\lfloor \log m \rfloor - 1}$.
Here $[3 \cdot 2^{\lfloor \log m \rfloor - 2}+1, \dots, 2^{\lfloor \log m \rfloor}]$ (i.e., ($\lfloor \log m \rfloor - 2$)-block number 4) is the
buffer. We can perform $\lfloor \log m \rfloor - 3$ stages with this buffer protecting us from
referencing any index $> m$ from the first six ($\lfloor \log m \rfloor - 3$)-blocks (which include
$\lfloor m/2 \rfloor + 1$).
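The case distinction can be checked mechanically. The Python sketch below (the name `terminal_case` is hypothetical) reads the bracket in [log m] as the floor, the reading under which both buffers lie inside the pattern, and verifies for a range of m that the buffer lies strictly between $\lfloor m/2 \rfloor + 1$ and m, is one $(k+1)$-block for the number k of stages performed, and absorbs the lookahead of Lemma 1 from every index in the protected blocks.

```python
def terminal_case(m: int):
    """Return (case, stages, buffer_lo, buffer_hi, covered_hi) for Cases 1 and 2.

    L = floor(log2 m); covered_hi is the last index of the blocks the buffer protects.
    Requires m >= 8 so that every block size below is an integer.
    """
    if m < 8:
        raise ValueError("sketch assumes m >= 8")
    L = m.bit_length() - 1                        # floor(log2 m)
    if m >= 3 * 2 ** (L - 1):
        # Case 1: buffer = (L-1)-block number 3; protects the first four (L-2)-blocks
        return 1, L - 2, 2 ** L + 1, 3 * 2 ** (L - 1), 4 * 2 ** (L - 2)
    # Case 2: buffer = (L-2)-block number 4; protects the first six (L-3)-blocks
    return 2, L - 3, 3 * 2 ** (L - 2) + 1, 2 ** L, 6 * 2 ** (L - 3)


for m in range(8, 4096):
    case, stages, b_lo, b_hi, covered = terminal_case(m)
    half = m // 2 + 1                             # floor(m/2) + 1
    assert half < b_lo and b_hi <= m              # buffer beyond floor(m/2)+1, inside pattern
    assert b_hi - b_lo + 1 == 2 ** (stages + 1)   # buffer is exactly one (stages+1)-block
    assert covered >= half                        # protected blocks include floor(m/2)+1
    # Lemma 1: after `stages` stages, index i <= covered looks at most at i-1+2^(stages+1)
    assert covered - 1 + 2 ** (stages + 1) <= b_hi

print(terminal_case(100), terminal_case(80))      # (1, 4, 65, 96, 64) (2, 3, 49, 64, 48)
```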

Let us describe the terminal stage for Case 1. In a few of the subcases
considered, the terminal stage will determine WITNESS(i) for the first four
($\lfloor \log m \rfloor - 2$)-blocks. In other subcases WITNESS(i) will be determined for $i \le \lfloor m/2 \rfloor + 1$
only. After stage $\lfloor \log m \rfloor - 2$, ($\lfloor \log m \rfloor - 2$)-certainty is satisfied. (It is easy to
prove this by induction on the number of stages using Lemma 1.) Case 1 breaks into
two subcases.

Case 1.1. PATTERN[$1,\dots,2^{\lfloor \log m \rfloor - 1}$] does not have a period of size $\le 2^{\lfloor \log m \rfloor - 2} - 1$.
(That is, if there had been a stage $\lfloor \log m \rfloor - 1$, it would have started at Box
2.) WITNESS[$2,\dots,2^{\lfloor \log m \rfloor - 2}$] does not have any zero. Each one of
($\lfloor \log m \rfloor - 2$)-blocks 2, 3 and 4 may have at most a single zero. (Lemma 1 and induction on
the number of stages are, again, all that is required to prove this. This was
referred to as ($\lfloor \log m \rfloor - 2$)-sparsity earlier.)

The terminal stage. For each of these three possible zeros, check in a
character-by-character fashion whether it stands for a period of the pattern. If not, update
WITNESS using (possibly) simultaneous writes into the same memory location.
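The character-by-character test of a remaining zero can be sketched as follows (a Python illustration; the helper name `check_candidate` is hypothetical). Every processor j that finds PATTERN(j) ≠ PATTERN(i-1+j) holds a valid witness for WITNESS(i), so correctness does not depend on which writer succeeds; the sketch keeps the smallest j, which is what the priority convention would produce.

```python
def check_candidate(pattern: str, witness: list, i: int) -> bool:
    """Test whether a zero WITNESS(i) really stands for a period (shift i-1).

    Returns True if shift i-1 is a period of the whole pattern (WITNESS(i) stays 0);
    otherwise records a witness and returns False.
    """
    if witness[i] != 0:
        return False
    m = len(pattern)
    mismatches = [j for j in range(1, m - i + 2)        # all j with i-1+j <= m, in parallel
                  if pattern[j - 1] != pattern[i - 2 + j]]
    if mismatches:
        witness[i] = min(mismatches)                    # priority write: smallest j wins
        return False
    return True


w = [0] * 6
print(check_candidate("abaab", w, 3), w[3])   # False 2: PATTERN(2) = b, PATTERN(4) = a
print(check_candidate("abaab", w, 4), w[4])   # True 0: 3 is a period of "abaab"
```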

Case 1.2. PATTERN[$1,\dots,2^{\lfloor \log m \rfloor - 1}$] has a period whose size is $P \le 2^{\lfloor \log m \rfloor - 2} - 1$.
(That is, if there had been a stage $\lfloor \log m \rfloor - 1$, it would have started at Box 4.)
The  terminal  stage. 

for all j, $2^{\lfloor \log m \rfloor - 2} < j \le m$ pardo
    (Comment. Check, character by character, whether P is the
    periodicity of the whole pattern (similar to Box 4).)
    if PATTERN(j) ≠ PATTERN(((j-1) mod P) + 1)
    then WITNESS(P+1) := j-P   (Recall that we use the convention that if
        several processors attempt to write, then the one with the smallest
        j succeeds.)
if WITNESS(P+1) ≠ 0
    (In words: did we stop considering P to be the suspected periodicity?)
then for all j, $2 \le j < (P + \mathrm{WITNESS}(P+1))/P$ pardo
         WITNESS(jP+1) := WITNESS(P+1) - (j-1)P
    (Explanation. As in Box 4, Proposition 1 precludes the possibility
    that these multiples of P are sizes of periods of the pattern.
    It is not difficult to see that the same character of the pattern
    (at location WITNESS(P+1)+P) witnesses against each of these periods.
    We use later the fact that if WITNESS(P+1)+P $> \lfloor m/2 \rfloor + 1$, then these
    assignments result in WITNESS(i) ≠ 0 for all i, $2 \le i \le \lfloor m/2 \rfloor + 1$,
    of the form jP+1.)
if WITNESS(P+1)+P $\le \lfloor m/2 \rfloor + 1$
then satisfy ($\lfloor \log m \rfloor - 2$)-sparsity in $\lfloor \log m \rfloor - 2 - k_1$ iterations
    (similar to Box 5; $k_1+1$ is the stage in which the transition into
    the present periodic mode was performed. Unlike Box 5, we
    operate here on ($\lfloor \log m \rfloor - 2$)-block number 2 as well.)
Explanation. The last if statement treats the case where P fails to be a period
of PATTERN[$1,\dots,\lfloor m/2 \rfloor + 1$]. The satisfaction of the if condition implies that the
assignments into WITNESS(jP+1) earlier in the terminal stage satisfy
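WITNESS(jP+1)+jP $\le \lfloor m/2 \rfloor + 1$. Considerations similar to those in the proof of Lemma 1 imply
that, as a result of the iterations of the last instruction, each of

WITNESS[$2^{\lfloor \log m \rfloor - 2}+1,\dots,2 \cdot 2^{\lfloor \log m \rfloor - 2}$], WITNESS[$2 \cdot 2^{\lfloor \log m \rfloor - 2}+1,\dots,3 \cdot 2^{\lfloor \log m \rfloor - 2}$]

(($\lfloor \log m \rfloor - 2$)-blocks 2 and 3) and WITNESS[$3 \cdot 2^{\lfloor \log m \rfloor - 2}+1,\dots,\lfloor m/2 \rfloor + 1$] has at most
one zero. Finally,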

Check in a character-by-character fashion whether these zeros should remain and,
if not, update WITNESS.

Next, we deal with the two remaining cases: P is a period of the whole pattern, or
P is a period of PATTERN[$1,\dots,\lfloor m/2 \rfloor + 1$] but not of the whole pattern.

if WITNESS(P+1) = 0 or WITNESS(P+1)+P $> \lfloor m/2 \rfloor + 1$
then for all j, $2^{\lfloor \log m \rfloor - 2} < j \le \lfloor m/2 \rfloor + 1$ pardo
    if WITNESS(j) = 0 and j mod P ≠ 1 and
       PATTERN(WITNESS(((j-1) mod P) + 1)) ≠ PATTERN(j-1+WITNESS(((j-1) mod P) + 1))
    then WITNESS(j) := WITNESS(((j-1) mod P) + 1)
(Explanation. If P is the period of the whole pattern (WITNESS(P+1) = 0), then this
instruction guarantees that if WITNESS(i) = 0 for some i, $1 < i \le \lfloor m/2 \rfloor + 1$, then i is of
the form jP+1.

If WITNESS(P+1)+P $> \lfloor m/2 \rfloor + 1$, we already argued that WITNESS(i) ≠ 0 for all i, $2 \le
i \le \lfloor m/2 \rfloor + 1$, of the form jP+1. The last instruction updates indices i which are
not of this form and results in the following.)

Claim 3. WITNESS(i) = 0 for at most five indices i, $2 \le i \le \lfloor m/2 \rfloor + 1$. As in
the proof of Claim 1, we use Claim 2, which has the following corollaries.
Corollary 3. ($\lfloor \log P \rfloor$)-sparsity is satisfied for the blocks that cover indices
$\le \lfloor m/2 \rfloor + 1$.

Corollary 4. For all indices $i \le \lfloor m/2 \rfloor + 1 - 2^{\lfloor \log P \rfloor + 2}$, WITNESS(i) ≠ 0.
Proof of Claim 3. By Corollary 4, only indices i with $\lfloor m/2 \rfloor + 1 - 2^{\lfloor \log P \rfloor + 2} < i \le
\lfloor m/2 \rfloor + 1$ may have WITNESS(i) = 0. These indices are included in at most five
($\lfloor \log P \rfloor$)-blocks. Corollary 3 implies that WITNESS is zero for at most one index in each
of these blocks. ∎
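The block counting behind Claims 1 and 3 is easy to check numerically. In the Python sketch below (the function name is hypothetical), a b-block t covers indices $(t-1)2^b+1,\dots,t\,2^b$, consistent with the block numbering used in Cases 1 and 2. A window of $2^{b+2}$ consecutive indices meets exactly four b-blocks when it ends on a block boundary, as in Claim 1, and at most five otherwise, as in Claim 3; with $b = \lfloor \log P \rfloor$ and at most one zero per block, this bounds the number of zeros.

```python
def blocks_touched(lo: int, hi: int, b: int) -> int:
    """Number of b-blocks met by the 1-based index range lo..hi (inclusive).

    Block t covers indices (t-1)*2**b + 1 .. t*2**b.
    """
    size = 2 ** b
    return (hi - 1) // size - (lo - 1) // size + 1


# Claim 1: the window (2^(k+1) - 2^(b+2), 2^(k+1)] ends on a block boundary.
claim1 = {blocks_touched(2 ** (k + 1) - 2 ** (b + 2) + 1, 2 ** (k + 1), b)
          for b in range(0, 5) for k in range(b + 1, b + 8)}
# Claim 3: the window (h - 2^(b+2), h] with h = floor(m/2)+1 arbitrary.
claim3 = {blocks_touched(h - 2 ** (b + 2) + 1, h, b)
          for b in range(0, 5) for h in range(2 ** (b + 2), 400)}
print(claim1, claim3)   # {4} {4, 5}
```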
Finally, 

Check in a character-by-character fashion whether these zeros should remain and,
if not, update WITNESS.

It is easy to verify that Case 1 of the terminal stage needs O(m) operations
and O(log m) time.

Case  2. 

No new ideas are required to resolve this case within the same complexity
bounds.

Complexity of Step 1. Step 1 requires O(m/p) time using any number of $p \le m/\log m$ processors.

How important is the model of parallel computation?

The strongest concurrent-write model of parallel computation considered in
this paper uses the following convention. Suppose that several processors attempt
to write simultaneously at the same memory location. Then the lowest-numbered
among the trying processors succeeds. In a weaker concurrent-write model
of parallel computation, several processors may attempt to write at the same memory
location only if they are seeking to write the same value. This results in this
value being written into the memory location. There was exactly one place in the
algorithm where we used the stronger model: the terminal stage of Step
1. We need the following problem for our discussion. Input. A vector of p bits.
Find the minimal index of the vector whose bit is 1 using p processors. [FRW-83]
proposed the following O(1)-time algorithm for the problem in the weaker
concurrent-write model of computation. Partition the input vector into $\lceil \sqrt{p} \rceil$
successive sub-vectors, each of length $\lceil \sqrt{p} \rceil$ (or $\lfloor \sqrt{p} \rfloor$). For each such sub-vector,
find, in O(1) time using O($\sqrt{p}$) processors, whether it has a one. Apply the O(1)-time
algorithm of [SV-81] for finding the minimum among these $\lceil \sqrt{p} \rceil$ sub-vectors (that is,
the first sub-vector that has a one) using p processors in the weaker model of computation.
Reapply this algorithm for finding the index of the minimal one in this sub-vector. Using
this algorithm we can simulate the string matching algorithm, which was given in the
stronger concurrent-write model, by the weaker concurrent-write model within the same
bounds for time and number of processors.
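A sequential Python emulation of the constant-round scheme just described may help. It is a sketch under assumptions: each parallel round is simulated by ordinary loops; the minimum among the flagged sub-vectors is found with the usual all-pairs elimination (processor (a, c) with a < c marks c as "not first" when both are flagged), used here as a stand-in for the [SV-81] routine; and the sub-vectors have length $\lceil \sqrt{p} \rceil$ except possibly the last. Every concurrent write in a round writes the same value (True), as the weaker model requires. The function names are hypothetical.

```python
import math


def first_true(flags):
    """Index of the first truthy flag, or None, by all-pairs elimination.

    In one parallel round, processor (a, c) with a < c writes True into beaten[c]
    when both flags are set; all concurrent writers write the same value.
    """
    beaten = [False] * len(flags)
    for a in range(len(flags)):
        for c in range(a + 1, len(flags)):
            if flags[a] and flags[c]:
                beaten[c] = True
    # Exactly one flagged, unbeaten position remains (if any flag is set);
    # only its processor writes the answer, so this write is exclusive.
    for c, f in enumerate(flags):
        if f and not beaten[c]:
            return c
    return None


def first_one_common_crcw(bits):
    """Minimal index (0-based) of a 1 in bits, or None, in O(1) emulated rounds."""
    p = len(bits)
    if p == 0:
        return None
    s = math.isqrt(p - 1) + 1                           # ceil(sqrt(p))
    blocks = [bits[t:t + s] for t in range(0, p, s)]    # at most ceil(sqrt(p)) sub-vectors
    # Round 1: each processor holding a 1 writes True into its sub-vector's flag.
    nonempty = [any(b) for b in blocks]
    # Round 2: first flagged sub-vector, with about (sqrt p)^2 = O(p) processors.
    t = first_true(nonempty)
    if t is None:
        return None
    # Round 3: first 1 inside that sub-vector, again with O(p) processors.
    return t * s + first_true(blocks[t])


print(first_one_common_crcw([0, 0, 0, 1, 0, 1]))        # 3
```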

Consider another problem. Input. A vector of ℓ bits. Compute the OR of
these bits on a concurrent-read exclusive-write PRAM. We use a balanced binary
tree with ℓ leaves to guide the computation. The number of operations of this
trivial algorithm is proportional to the number of nodes in the tree, and its time
is proportional to its height, that is, O(ℓ) operations and O(log ℓ) time. Apply
Brent's theorem to get O(ℓ/p) time using any number of $p \le \ell/\log \ell$ processors.
Using this algorithm we can run our algorithm on a concurrent-read exclusive-write
PRAM in time O(n/p) using any number of $p \le n/\log^2 n$ processors. The text
analysis part of our algorithm alone can be run on a concurrent-read
exclusive-write PRAM in time O(n/p) using any number of $p \le n/\log n$ processors.
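The CREW OR computation and its Brent-style scheduling can be sketched in Python as follows (function names hypothetical). Each round computes one level of the balanced tree; every tree node is written by exactly one processor, so writes are exclusive while children may be read concurrently. `brent_steps` counts the synchronous steps when only p processors are available and a level with n nodes takes ⌈n/p⌉ steps, which is the scheduling argument behind Brent's theorem.

```python
def tree_or(bits):
    """OR of bits over a balanced binary tree, one tree level per synchronous round.

    Node t of the next level is written only by its own processor (exclusive write);
    it reads its two children. Returns (result, number_of_rounds).
    """
    level = [bool(b) for b in bits] or [False]
    rounds = 0
    while len(level) > 1:
        level = [level[t] or (t + 1 < len(level) and level[t + 1])
                 for t in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds


def brent_steps(ell, p):
    """Steps needed for tree_or on ell leaves with only p processors.

    A level with n nodes is simulated in ceil(n / p) steps (Brent-style scheduling).
    """
    steps, n = 0, ell
    while n > 1:
        n = (n + 1) // 2              # nodes on the next level
        steps += -(-n // p)           # ceil(n / p)
    return steps


print(tree_or([0, 0, 1, 0, 0]))       # (True, 3)
print(brent_steps(1024, 102))         # 18
```

With ℓ = 1024 and p = 102 (roughly ℓ/log ℓ), the 18 steps are within a small constant of ℓ/p ≈ 10, in line with the O(ℓ/p) bound.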
Conclusion. We presented a new linear-time serial algorithm for the string
matching problem in which the analysis of the text is particularly simple. The
algorithm is parallel linear for a very wide range of the number of processors.
The exact range depends on the model of computation being used.

ACKNOWLEDGEMENT. I am grateful to Zvi Galil for encouraging me to continue
improving the results in this paper and for quite a few insights through both
discussions and his paper. Helpful comments by Dennis Shasha are also gratefully
acknowledged.

REFERENCES 


[AHU-74] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of
Computer Algorithms, Addison-Wesley, Reading, MA, 1974.

[AKS-83] M. Ajtai, J. Komlós and E. Szemerédi, "An O(n log n) sorting network",
Combinatorica 3,1 (1983), 1-19.

[Ba-78] T.P. Baker, "A technique for extending rapid exact-match string matching
to arrays of more than one dimension", SIAM J. Comput. 7,4 (1978), 533-541.

[BM-77]  R.S.  Boyer  and  J.S.  Moore,  "A  fast  string  searching  algorithm",  Comm. 
ACM  20(1977),  762-772. 

[BV-84]  I.  Bar-On  and  U.  Vishkin,  "Optimal  parallel  generation  of  a  computation 
tree  form",  Proc.   1984  International  Conf.   on  Parallel  Processing,  490-495. 

[CLC-81] F.Y. Chin, J. Lam and I. Chen, "Optimal parallel algorithms for the
connected component problems", Proc. 1981 International Conf. on Parallel
Processing (1981), 170-175.

[G-84] Z. Galil, "Optimal parallel algorithms for string matching", Proc. 16th
ACM Symp. on Theory of Computing, 1984, 240-248.

[FRW-83] F.E. Fich, R.L. Ragde and A. Wigderson, "Relation between
concurrent-write models of parallel computation", preprint, Div. of Computer
Science, Univ. of Calif., Berkeley, 1983.

[KMP-77] D.E. Knuth, J.H. Morris and V.R. Pratt, "Fast pattern matching in
strings", SIAM J. Comput. 6 (1977), 322-350.

[LS-62] R.C. Lyndon and M.P. Schützenberger, "The equation $a^M = b^N c^P$ in a free
group", Michigan Math. J. 9 (1962), 289-298.

[PVW-83] W. Paul, U. Vishkin and H. Wagener, "Parallel dictionaries on 2-3 trees",
Proc. 10th ICALP, Lecture Notes in Computer Science 154, Springer-Verlag,
1983, 597-609.

[RV-83] J.H. Reif and L.G. Valiant, "A logarithmic time sort for linear size
networks", Proc. 15th Annual ACM Symp. on Theory of Computing (1983), 10-16.

[SV-81] Y. Shiloach and U. Vishkin, "Finding the maximum, merging, and sorting in a
parallel computation model", J. Algorithms 2 (1981), 88-102.

[TC-84] Y.H. Tsin and F.Y. Chin, "Efficient parallel algorithms for a class of
graph theoretic problems", SIAM J. Comput. 13 (1984), 580-599.


[TV-83] R.E. Tarjan and U. Vishkin, "An efficient parallel biconnectivity
algorithm", TR 69, Dept. of Computer Science, Courant Institute, NYU, 1983.
To appear in SIAM J. Comput.

[Vi-81] U. Vishkin, "An optimal parallel connectivity algorithm", Technical Report
RC 9149, IBM Thomas J. Watson Research Center, Yorktown Heights, New York,
1981. To appear in Discrete Applied Mathematics.

[Vi-83a] U. Vishkin, "Synchronous parallel computation - a survey", TR 71, Dept. of
Computer Science, Courant Institute, NYU, 1983.

[Vi-83b]  U.  Vishkin,  "An  optimal  parallel  algorithm  for  selection",  preprint, 
1983. 


[Vi-84  ]  U.  Vishkin,  "Randomized  speed-ups  in  parallel  computation",   Proc.    16th 
Annual  ACM  Symp.   on  Theory  of  Computing  (1984),  230-239. 


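[Figure 1: two copies of PATTERN aligned over TEXT at positions $j_1$ and $j_2$; the marked quantities are the witness w, the offset $j_2-j_1$, and the text position $j_2-1+w$.]

Figure 1. A duel between positions $j_1$ and $j_2$ of the text.

- PATTERN(w) ≠ PATTERN($j_2-j_1+w$)
- If TEXT($j_2-1+w$) ≠ PATTERN(w), then there is no occurrence of the pattern at $j_2$.
- If TEXT($j_2-1+w$) ≠ PATTERN($j_2-j_1+w$), then there is no occurrence of the pattern at $j_1$.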
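The duel of Figure 1 can be written as a short Python function (a sketch; names hypothetical, 1-based positions simulated with offsets). It assumes $j_1 < j_2$, that WITNESS($j_2-j_1+1$) is nonzero, and that the compared text position $j_2-1+w$ exists. Because PATTERN(w) ≠ PATTERN($j_2-j_1+w$), the single character TEXT($j_2-1+w$) differs from at least one of them, so one comparison eliminates at least one candidate.

```python
def naive_witness(pattern):
    """Hypothetical helper: WITNESS(i) for 2 <= i <= m by brute force (0 = shift i-1 is a period)."""
    m = len(pattern)
    witness = [0] * (m + 1)
    for i in range(2, m + 1):
        for w in range(1, m - i + 2):
            if pattern[w - 1] != pattern[i - 2 + w]:   # PATTERN(w) != PATTERN(i-1+w)
                witness[i] = w
                break
    return witness


def duel(text, pattern, witness, j1, j2):
    """Duel between candidate text positions j1 < j2 (1-based); returns the survivor.

    Assumes WITNESS(j2-j1+1) != 0 and that position j2-1+w exists in the text.
    """
    w = witness[j2 - j1 + 1]         # PATTERN(w) != PATTERN(j2-j1+w)
    c = text[j2 - 2 + w]             # TEXT(j2-1+w)
    if c != pattern[w - 1]:
        return j1                    # TEXT(j2-1+w) != PATTERN(w): no occurrence at j2
    return j2                        # then c != PATTERN(j2-j1+w): no occurrence at j1


W = naive_witness("abab")
print(duel("xabab", "abab", W, 2, 3), duel("xabab", "abab", W, 1, 2))   # 2 2
```

The survivor of a duel is only a remaining candidate; whether it is an actual occurrence still has to be verified.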
Figure 2. Step 1.
Figure 3. The buffers for the terminal stage of Step 1.